Headerless, Quoteless, but not Hopeless? Using Pairwise Email Classification to Disentangle Email Threads

Authors

  • Emily Jamison
  • Iryna Gurevych
Abstract

Thread disentanglement is the task of separating out conversations whose thread structure is implicit, distorted, or lost. In this paper, we perform email thread disentanglement through pairwise classification, using text similarity measures on non-quoted texts in emails. We show that i) content text similarity metrics outperform style and structure text similarity metrics in both class-balanced and class-imbalanced settings, and ii) although feature performance is dependent on the semantic similarity of the corpus, content features are still effective even when controlling for semantic similarity. We make available the Enron Threads Corpus, a newly extracted corpus of 70,178 multi-email threads with emails from the Enron Email Corpus.
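
As a rough illustration of the pairwise idea described in the abstract, the sketch below scores two email bodies with a single content similarity measure (cosine similarity over bag-of-words counts) and decides whether they belong to the same thread. It is not the authors' feature set or classifier: the function names, the tokenizer, and the fixed `threshold` are all assumptions made for this example, and the threshold merely stands in for a trained classifier.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def same_thread(email_a, email_b, threshold=0.3):
    """Hypothetical decision rule: a fixed threshold (an assumption for this
    sketch) replaces the trained pairwise classifier."""
    return cosine_similarity(email_a, email_b) >= threshold
```

A call such as `same_thread(body_1, body_2)` expects the non-quoted body text of each email as a plain string; in a realistic setup the similarity score would be one of several features passed to a learned model rather than compared against a hand-picked cutoff.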

Similar resources

Email Folder Classification using Threads

While automatic classification of email is obviously a useful task to study, it is not obvious how to best utilize the rich metadata specific to email to improve the quality of the classification. In this paper, we propose a simple algorithm for using email threads to improve the precision of a personal email assistant’s automatic folder classification. We evaluate the approach on a large email...

A High Capacity Email Steganography Scheme using Dictionary

The main objective of steganography is to conceal a secret message within a cover-media in such a way that only the original receiver can discern the presence of the hidden message. The cover-media can be a text, email, audio, image, and video, which can be transmitted through a public channel, such as the Internet. By extending the use of email among Internet users, the provision of email steg...

Summarizing Email Threads

Summarizing threads of email is different from summarizing other types of written communication as it has an inherent dialog structure. We present initial research which shows that sentence extraction techniques can work for email threads as well, but profit from email-specific features. In addition, the presentation of the summary should take into account the dialogic structure of email commun...

Why Forwarded Email Threads are Hard to Read: The Email Format as an Antecedent of Email Overload

Research has shown that excessive email use leads to feelings of being overwhelmed and stressed. Existing coping solutions, which mitigate email overload, address the number of emails and, in consequence, the time spent on emails. These approaches are congruent with existing research on antecedents of email overload. Further coping solutions include addressing email threads. However, we lack a ...

A Study of Topic and Topic Change in Conversational Threads (Thesis, Naval Postgraduate School, Monterey, California)

This thesis applies Latent Dirichlet Allocation (LDA) to the problem of topic and topic change in conversational threads using e-mail. We demonstrate that LDA can be used to successfully classify raw e-mail messages with threads to which they belong, and compare the results with those for processed threads, where quoted and reply text have been removed. Raw thread classification performs better...

Publication date: 2013